feat(scan_layers): verify scan_layers compatibility from checkpoint metadata #4304

Open
RexBearIU wants to merge 1 commit into main from jackyf/proactive-scan-layers

RexBearIU (Collaborator) commented Jun 30, 2026

Description

Note

This PR is based on #4269.

This PR introduces proactive validation of scan_layers configuration compatibility when loading checkpoint metadata in MaxText.

Key changes:

  • Save scan_layers in Custom Metadata: Updated save_checkpoint in src/maxtext/common/checkpointing.py to write the scan_layers configuration parameter to checkpoint custom metadata.
  • Proactive Verification: Added verification inside the from_pretrained function in src/maxtext/utils/model_creation_utils.py to compare the saved scan_layers value against the running configuration, raising a descriptive ValueError on mismatch to prevent silent model structure mismatches.
  • Decoupled LoRA Unit Tests: Updated tests/post_training/unit/lora_utils_test.py to assert specifically on the "lora" key in custom_metadata instead of full dict equality, decoupling LoRA assertions from global metadata additions.
  • Unit Testing: Added dedicated test cases in tests/unit/model_creation_utils_test.py to verify matching, mismatching, and missing scan_layers metadata scenarios.
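The save-then-verify flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `build_custom_metadata` and `verify_scan_layers`, the `"scan_layers"` key string, and the choice to skip verification when the key is missing are all assumptions made for the example.

```python
from typing import Any, Mapping, Optional

# Assumed key name; the PR only says scan_layers is written to custom metadata.
SCAN_LAYERS_KEY = "scan_layers"


def build_custom_metadata(scan_layers: bool, existing: Optional[Mapping[str, Any]] = None) -> dict:
  """Hypothetical helper: record scan_layers next to any existing metadata entries."""
  metadata = dict(existing or {})
  metadata[SCAN_LAYERS_KEY] = bool(scan_layers)
  return metadata


def verify_scan_layers(custom_metadata: Optional[Mapping[str, Any]], config_scan_layers: bool) -> None:
  """Hypothetical helper: raise ValueError if saved and configured scan_layers differ.

  Assumption: checkpoints without the key (for example, older ones) are
  accepted without verification, since there is nothing to compare against.
  """
  if not custom_metadata or SCAN_LAYERS_KEY not in custom_metadata:
    return
  saved = bool(custom_metadata[SCAN_LAYERS_KEY])
  if saved != bool(config_scan_layers):
    raise ValueError(
        f"Checkpoint was saved with scan_layers={saved}, but the running config "
        f"sets scan_layers={bool(config_scan_layers)}. Set scan_layers to match "
        "the checkpoint or convert the checkpoint before loading."
    )


if __name__ == "__main__":
  saved_metadata = build_custom_metadata(True, {"lora": {"rank": 8}})
  verify_scan_layers(saved_metadata, True)  # matching: passes silently
  try:
    verify_scan_layers(saved_metadata, False)  # mismatching: raises
  except ValueError as err:
    print(err)
```

Failing at load time with both values in the message means the mismatch surfaces before any parameters are restored into a model with a different layer layout.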

Tests

Tested this change by running the unit tests in model_creation_utils_test.py and lora_utils_test.py on the CPU platform:

PYTHONPATH=src pytest tests/unit/model_creation_utils_test.py tests/post_training/unit/lora_utils_test.py

Output: 74 passed
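As an illustration of why the LoRA test was decoupled, the sketch below contrasts full-dict equality with a key-specific assertion. The metadata contents (`rank`, `alpha`) and the `fake_saved_metadata` helper are hypothetical stand-ins, not values taken from the actual test file.

```python
def fake_saved_metadata() -> dict:
  """Hypothetical stand-in for the custom_metadata captured from a save call."""
  return {"lora": {"rank": 8, "alpha": 16.0}, "scan_layers": True}


def test_lora_metadata_is_saved():
  custom_metadata = fake_saved_metadata()
  expected_lora = {"rank": 8, "alpha": 16.0}

  # Full-dict equality would fail as soon as a global key such as
  # "scan_layers" is added next to "lora":
  #   assert custom_metadata == {"lora": expected_lora}

  # Asserting only on the "lora" entry keeps the test focused on LoRA.
  assert custom_metadata["lora"] == expected_lora
```

With this pattern, later additions to the global metadata should not require touching the LoRA tests.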

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

RexBearIU changed the base branch from jackyf/lora-ckpt-metadata to main on June 30, 2026 10:48
codecov Bot commented Jun 30, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/maxtext/common/checkpointing.py | 83.33% | 0 Missing and 1 partial ⚠️ |

RexBearIU force-pushed the jackyf/proactive-scan-layers branch 2 times, most recently from 0330b0a to c0718c2 on July 2, 2026 08:25